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Abstract 

Until recently, the use of Bayesian inference in population genetics was lim- 
ited to a few cases because for many realistic population genetic models the 
likelihood function camiot be calculated analytically . The situation changed 
with the advent of likelihood-free inference algorithms, often subsumed under 
the term Approximate Bayesian Computation (ABC). A key innovation was 
the use of a post-sampling regression adjustment, allowing larger tolerance 
values and as such shifting computation time to realistic orders of magnitude 
[1]. Here we propose a reformulation of the regression adjustment in terms of a 
General Linear Model (GLM). This allows the integration into the sound the- 
oretical framework of Bayesian statistics and the use of its methods, including 
model selection via Bayes factors. We then apply the proposed methodology 
to the question of population subdivision among western chimpanzees Pan 
troglodytes verus. 

Introduction 

With the advent of ever more powerful computers and the refinement of algorithms 
like MCMC or Gibbs sampling, Bayesian statistics has become an important tool 

for scientific inference during the past two decades. Until recently many scientists 
shunned Bayesian methods - mainly because of the philosophical problems related 
to the choice of prior distributions - but the development of hierarchical and em- 

* These two authors contributed equally to this work 

tEcole d'ingenieurs de Pribourg, Bd. de Perolles 80, 1705 Pribourg, Switzerland, 
christoph.leuenberger@eif.ch 

* University of Berne, Computational and Molecular Population Genetics Laboratory, 3012 
Berne, Switzerland, daniel.wegmann@zoo.unibe.ch 



1 



pirical Bayes turned them into an alternative even for hard-core frequentists (see 
e.g. [22] for a discussion of these issues). 

Consider a model A4 creating data V (DNA sequence data, for example) determined 
by parameters 6 from some (bounded) parameter space 11 c whose joint prior 
density wc denote by tt{6). The quantity of interest is the posterior distribution of 
the parameters which can be calculated by Reverend Bayes' golden rule 

7T{e\'D) = c-fM{'D\e)7r{e), 

where /(I?|0) is the likelihood of the data and c is a normalizing constant. Direct use 
of this formula, however, is often thwarted by the fact that the likelihood function 
cannot be calculated analytically for many realistic population genetic models. In 
these cases one is obliged to have recourse to stochastic simulation. Tavare et al. 
[24] propose a rejection sampling method for simulating a posterior random sample 
where the full data V is replaced by a summary statistics s (like the number of 
segregating sites in their setting). Even if the statistics are not sufficient for V - that 
is, the statistics do not capture the full information contained in the data -, rejection 
sampling allows for the simulation of approximate posterior distributions of the 
parameters in question (the scaled mutation rate in their model). This approach was 
extended to multiple-parameter models with multivariate summary statistics s = 
(si, . . . , Sn)^ by Weiss and von Haeseler [27]. In their setting a candidate vector 6 of 
parameters is simulated from a prior distribution and is accepted if its corresponding 
vector of summary statistics is sufficiently close to the observed summary statistics 
Sobs with respect to some metric in the space of s, i.e. if dist(s, Sobs) < e for a 
fixed tolerance e. If we suppose that the likelihood /x(s|0) of the full model is 
continuous and non-zero around Sgbs then the likelihood of this truncated model 
■M.e{sobs) obtained by this accept-reject process is given by 

/,(s|0) = Ind(seB,(sob,))-/M(s|0)-(/ fM{s\e)ds)-^ (1) 

where Be = Be{sobs) = {s G M"|dist(s, Sob^) < e} is the e-ball in the space of 
summary statistics and Ind(-) is the indicator function. Observe that fc{s\6) de- 
generates to a (Dirac) point measure centered at Sobs as e ^ 0. If the parameters 
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are generated from a prior -jt{6) then the distribution of the parameters retained 
after the rejection process outlined above is given by 

We shall call this density the truncated prior. It is not hard to check that 

InfM{sobs\eMe)de j^Msobs\e)TT,{e)d9- ^ ' 

Thus the posterior distribution of the model for s = Sobs given the prior it {9) is 
exactly equal to the posterior distribution of the truncated model Mc{sobs) given 
the truncated prior tt^ {6) . If wc can estimate the truncated prior and make an edu- 
cated guess for a parametric statistical model of Mf {sobs), wc arrive at a reasonable 
approximation of the posterior Tr{0\sobs) even if the likelihood of the full model M. 
is unknown. It is to be expected that due to the localization process the truncated 
model will exhibit a simpler structure than the full model M. and thus be easier to 
estimate. 

Estimating tTc{9) is straightforward, at least when the summary statistics can be 
sampled from A1 in a reasonable amount of time: Sample the parameters from the 
prior tt{9), create their respective statistics s from A4 and save those parameters 
whose statistics lie in Be{sobs) in a list V = {9^ , ... ,9^}. The empirical distri- 
bution of these retained parameters yields an estimate of 7re(0). If the tolerance 
e is small then one can assume that /x(s|0) is close to some (unknown) constant 
over the whole range of Be{sobs)- Under that assumption, formula (3) shows that 
n{9\sobs) ~ '^e{9). However, when the dimension n of summary statistics is high - 
and for more complex models dimensions like n = 50 are not unusual -, the "curse 
of dimensionality" raises its ugly head: The tolerance must be chosen rather large or 
else the acceptance rate becomes prohibitively low. This, however, distorts the pre- 
cision of the approximation of the posterior distribution by the truncated prior (see 
[26]). This situation can be partially alleviated by speeding up the sampling pro- 
cess; such methods are subsumed under the term approximate Bayesian computation 
(ABC). Marjoram et al. [16] develop a variant of the classical Metropolis-Hastings 
algorithm (termed ABC-MCMC in [23]) which allows them to sample directly from 
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the truncated prior 7re(0). In [23] a sequential Monte Carlo sampler (ABC-PRC) is 
proposed requiring substantially less iterations than ABC-MCMC. But even when 
such methods are applied, the assumption that /x(s|0) is constant over the e-ball 

is a very rough one, indeed. 

In order to take into account the variation of within the e-ball, a post- 

sampling regression adjustment (ABC-REG) of the sample V of retained parameters 
is introduced by Beaumont et al. [1]. Basically, they postulate a (locally) linear 
dependence between the parameters 6 and their associated summary statistics s. 
More precisely, the (local) model they implicitly assume is of the form = Ms + 
mo + e, where M is a matrix of regression coefficients, mo a constant vector and 
e a random vector of zero mean. Computer simulations suggest that for many 
population models ABC-REG yields posterior marginal densities that have narrower 
HPD (highest posterior density) regions and are more closely centered around the 
true parameter values than the empirical posterior densities directly produced by 
ABC-samplers ([26]). From a point of view of statistical modeling, however, it 
is unusual to take the parameters d as endogenous and the summary statistics 
s as exogenous variables. As a consequence, this regression adjustment does not 
take into account the prior distribution; it can lead to misspecified posteriors, in 
particular, it may yield posteriors that are non-zero in parameter regions where 
the priors actually vanish (see Figure IB for an illustration of this phenomenon). 
Moreover, it is not clear how ABC-REG could yield an estimate of the marginal 
density of model M. at So^g, an information that is crucial for model comparison. 

One can overcome these drawbacks by stipulating for M.c{sobs) a General Linear 
Model (abbreviated as GLM in the literature - not to be confused with the Gen- 
eralized Linear Models which unfortunately share the same abbreviation.) To be 
precise, we assume the summary statistics s created by the truncated model's like- 
lihood /e(s|0) to satisfy 

s|6> = C6l-hco + e, (4) 
where C is a n x m-matrix of constants, cq a n x 1-vector and e a random vector 
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with a multivariate normal distribution of zero mean and covariance matrix S^: 

e~A^(0,S,). 

A GLM has the advantage to take into account not only the (local) linearity, but 
also the strong correlation normally present between the components of the sum- 
mary statistics. Of course, the model assumption (4) can never represent the full 
truth since its statistics are in principle unbounded whereas the likelihood fe{s\0) is 
supported on the e-ball around Sgbs- But since the multivariate Gaussians will fall 
off rapidly in practice and not reach far out off the boundary of Bc {sobs) this is to 
be a disadvantage we can live with. In particular, the OLS-estimate outlined below 
implies that for e — > the constant cq tends to Sgbs whereas the design matrix C 
and the covariance matrix both vanish. This means that in the limit of zero 
tolerance e = our model assumption yields the true posterior distribution of A4. 

Theory Section 

In this section we describe the above methodology - referred to as ABC-GLM in 
the following - in more detail. The basic two-step procedure of ABC-GLM may be 
summarized as follows: 

GLMl Given a model A4 creating summary statistics s and given a value of 
observed summary statistics Sobs, create a sample of retained parameters 
9^, j = 1,...,N, with aid of some ABC-sampler (ABC-REJ, ABC-MCMC or 
ABC-PRC) based on a prior distribution tt{6) and some choice of the tolerance 
e>0. 

GLM2 Estimate the truncated model Aiei^obs) as a General Linear Model and 
determine, based on the sample 6-', from the truncated prior ^^{6) an ap- 
proximation the posterior Tr{d\Sobs) according to formula (3). 

Let us look more closely at these two steps: 

GLMl: ABC-Scimpling. We refer the reader to [16] and [23] for details con- 
cerning ABC-algorithms and to [17] for a comprehensive review of computational 
methods for genetic data analysis. Let us just add a few words: Summary statistics 
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tend to be highly correlated in practice. In that case, the best choice for a distance 
function used for the rejection condition dist(s,So6s) < ^ is the Mahalanobis dis- 
tance (see e.g. [21] for a definition). As an alternative, we recommend to reduce the 
number of summary statistics by a Principal Components Analysis (PCA). PCA 
de-corrclates the statistics and the Mahalanobis distance is then identical to the 
standard L^-distance. Moreover, PCA has the advantage of reducing the dimen- 
sion of the space of summary statistics. Enough principal components should be 
retained, though, to keep the potential loss of information tolerable. A more sophis- 
ticated method of reducing the dimension of summary statistics, based on Partial 
Least Squares (PLS), in described in [26]. 

To fix the notation, let V = be a sample of vector- valued param- 

eters created by some ABC-algorithm simulating from some prior tt{0), and S = 
{s^, .... s^^} the sample of associated summary statistics produced by the model M. 
Each parameter G-' is an m-dimensional column vector = {9^ , . . . , 6*^)^ and each 
summary statistics an n-dimensional column vector s-' = (s-}, . . . ,5^)^ ^ 'Se(so6s). 
The samples V and <S can thus be viewed as mx N- and n x A''- matrices P and S, 
respectively. 

The empirical estimate of the truncated prior 7re(0) is given by the discrete distri- 
bution that puts a point mass of 1/N on each value 6-' G V. We smoothen out this 
empirical distribution by placing a sharp Gaussian peak over each parameter value 
O-' . More precisely, we set 

where 

ct>{e - d^, s,) = ^^^^^ exp(-l(0 - dYi:^\e - e^) 

and 

= diag((7i,...,c7„j) 

is the covariance matrix of cp which determines the width of the Gaussian peaks. 
The larger the number N of sampled parameter values, the sharper the peaks can 
be chosen in order to still get a rather smooth tt^. If the parameter domain 11 
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is normalized to [0,1]™, say, then a reasonable choice is cjk = 1/N. Otherwise, 
Uk should be adapted to the parameter range of the parameter component 6k- 
Too small values of ak will result in wiggly posterior curves, too large values might 
unduly smear out the curves. The best advice is to run the calculations with several 
choices for Sg. 

GLM2: GenerEil Lineeir Model. As explained in the introduction, we assume 
the truncated model Me{sobs) to be normal hnear, i.e. the random vectors s satisfy 

to (4). The covariaiicc matrix S,, encapsulates the strong correlations normally 
present between the components of the summary statistics. C, Cq and Ylg can be 
estimated by standard multivariate regression analysis from the sample V , S created 
in step GLMl^ To be specific, set X = (1|P*), where 1 is an TV x 1-vector of I's. 
C and Co are determined by the usual least squares estimator 

(co|C) =SX(X*X)-i, 
and for we have the estimate 

Sg = — R*R, 

N — m 

where R = S* — X • (co|C)* are the residuals. The likelihood for this model - 
dropping the hats on the matrices to simplify the notation - is given by 

/e(s|0) = |27rS,|-V2 . exp (^-^(s - C0 - co)* 5];^ (s - C0 - cq)) . (6) 

An exhaustive treatment of linear models in a Bayesian (econometric) context is 
given in Zellner's book [29]. 

Recall from (3) that for a prior 7r(0) and an observed summary statistic Sobs, the 
parameter's posterior distribution for our full model M. is given by 

TT{d\Sobs)=C- h{Sobs\e)TTe{e). (7) 

^Strictly speaJjing, one must redo an ABC-sample from uniform priors over 11 in order to get 
an unbiased estimate of the GLM if the prior 7r(0) is not uniform already. On the other hand, 
ordinary least squares estimators are quite insensitive to the prior's influence. In practice, one can 
as well use the sample V to to the estimate. 
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where fe{sobs\d) is the likelihood of the truncated model A4e{sobs) given by (6) and 
ne{9) is the estimated (and smoothed) truncated prior given by (5). 
Performing some matrix acrobatics (see e.g. the proof of Lemma 2.1 in [15]) one 
can show that the posterior (7) is - up to a multiplicative constant - of the form 
Eiljexp(-iQj) where 



Qj = {e-t^yT-\d-t3) + isobs-coy's;\sobs-co) + ... 



Here T, t-' and v-' are given by 



T= (C*S7^C + S^^) ^ 



(8) 



and t-' = Tv-', where 



vJ' = c*s;i(s„6,-co) + s,-i0^'. 



(9) 



From this we get 



n{e\sobs) cx J2c{e^) exp ( - -(0 - t^fT-\d - t^) ) , 
i=i ^ ^ 



where 



c(6>*) = exp 



^ ((0^)*S,-i0^- - (v^^Tv^) 



(10) 



(11) 



When the number of parameters exceeds two, graphical visualization of the posterior 

distribution becomes impractical and marginal distributions must be calculated. 
The marginal posterior density of the parameter 0^ is defined by 



Tr{ek\s)= [ n{e\s)dd-k, 



where integration is performed along all parameters except 6k. 

Recall that the marginal distribution of a multivariate normal Afi^fJ-, S) with respect 

to the fc-th component is the univariate normal density M{^ik, (^k,k)- Using this fact, 
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it is not hard to show that the marginal posterior of parameter Ok is given by 

n{ek\sobs)=a-J2c{e^)e^p . (12) 

where Tk^k is the fc-th diagonal element of the matrix T, tj, is the fc-th component 
of the vector P , and c(9-') is still determined according to (11). The normalizing 
constant a could, in principle, be determined analytically but is in practice more 
easily recovered by a numerical integration. Strictly speaking, the integration should 
only be done over the bounded parameter domain 11 and not over the whole of IR™. 
But this no longer allows for an analytic form of the marginal posterior distribution. 
For large values of N the diagonal elements in the matrix Eg can be chosen so small 
that the error is in any case negligible. 

Model selection. The principal difficulty of model selection methods in non- 
parametric settings is that it is nearly impossible to estimate the likelihood of A4 at 
Sobs due to the high dimension of the summary statistics ( "curse of dimensionality" ) ; 
see [2] for an approach based on multinomial logit. Parametric models on the other 
hand lend themselves readily to model selection via Bayes factors. Given the model 
M, one must determine the marginal density 

fMM= ! f{sobs\eMd)dd. 

Jn 

It is easy to check from (1) and (2) that 

fM{Sobs) = AiSobs,-!^) ■ / fe{Sobs\d)ne{0)de. 

Jn 

Here 

A,{sobs,7T) := [ Tr{e) [ fM{s\d)dsdd (13) 
Jn Jb^ 

is the acceptance rate of the rejection process. It can easily be estimated with 

aid of ABC-REJ: Sample parameters from the prior it{6), create the corresponding 
statistics s from M. and count what fraction of the statistics fall into the e-ball 
centered at Sobs- 
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If we assume the underlying model of A4e{sobs) to be our GLM then the marginal 
density of A4 at Sgbs can be estimated as follows: 



fMM = ^[^^T.'^P ^-l{s^bs-m^fT)-\sobs-m^)) (M) 
where the sum runs over the parameter sample V = {6^, . . . , 6^}, 

D = S, + CSeC^ 

and 

m^' =co + C6>^'. 

For two models Ma and Mb with prior probabihties wa and ttb = 1 — tta, the 
Bayes factor Bab in favor of model Ma over model Mb 

Bab ~ -^-^^^^"^^^ (^15^ 

fMsi^obs) 

where the marginal densities /ma ^^'^ Jmb ^'^^ calculated according to (14). The 
posterior probability of model Ma '^ 

HI \ A \ \ Babt^a 
f{MA\Sobs) = 



BabT^A + TTs 



Simulation Section 

A toy model. In Figure 1 we present the comparison of posteriors obtained with 
rejection sampling, ABC-REG and ABC-GLM, with those determined analytically 
("true posteriors"). As a toy model we inferred the population- mutation parameter 
= ANfj, from the number of segregating sites 5 of a sample of sequences with 
lO'OOO bp for different observed values and tolerance levels. Estimations are always 
based on 5000 simulations with dist(S', ^ob^) < e, and we report the average of 25 
independent replications per data point. Estimation bias of the different approaches 
was assessed by computing the Li-distance between the inferred posterior and the 
true one obtained from analytical calculations using the likelihood function intro- 
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duced by Watterson [30]. Recall that the Li-distance of two (continuous) densities 
f{e) and g{e) 

\\f-9\\i^ I \m~9mde 

measures the area between the function curves. It is equal to 2 when / and g have 
disjoint supports and it vanishes when the functions are identical. 
When we used a uniform prior 6 ^ Unif([0.005, 10]) (Figure lA to C), both ABC- 
REG and ABC-GLM give comparable results and improve the posterior estimation 
compared to the simple rejection algorithm except for very low tolerance values e 
where the rejection algorithm is expected to be very close to the true posterior. 
The average Li-distance over all observed data sets and tolerance values e are 
0.4726, 0.2609 and 0.1814 for the rejection algorithm, ABC-REG and ABC-GLM, 
respectively. Note that perfect matches between the approximate and the true 
posteriors are difficult to obtain because all approximated posteriors depend on 
a smoothing step which may not give accurate results close to boarders of their 
supports. In order to have a fair comparison, we adjusted the smoothing parameters 
(bandwidths) such as to get the best results for both approaches. 




observed number of segregating sites S 



Figure 1: Comparison of rejection (A and D), ABC-REG (B and E) and ABC-GLM 
(C and F) posteriors with those obtained from analytical likelihood calculations. Shades 
indicate the distance between inferred and analytically calculated posterior (see text). 
White corresponds to an exact match (zero distance) and darker grey shades indicate larger 
distances. If the the inferred posterior differs from the analytical more than the prior does, 
squares are marked in red. Shown are averages over 25 independent estimations. 

However, when we used a discontinuous prior Unif( [0.005, 3] U [6, 10]) with an 
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- admittedly quite artificial - "gap" in the middle, we observed a quite distinct 
pattern (see Figure 2 and Figure ID to E). One clearly recognizes that posteriors 
inferred with ABC-REG are frequently misplaced and often even further away from 
the true posterior (in Li-distance) than the prior. The reason for this is that in 
the regression step of ABC-REG parameter values may easily be shifted outside the 
prior support. This behavior of ABC-REG has been observed earlier and as an ad 
hoc solution Hamilton et al. [12] proposed to transform the parameter values prior 

to the regression step by a transformation of the form y = —ln{tan{- 

where a and b are the lower and upper borders of the prior support interval. For 
more complex priors - like the discontinuous prior used here - this transformation 
may not work. In fact, we applied this transformation whenever we performed 
ABC-REG. By adding additional borders, smoothing introduces a larger bias (see 
Figure 2). Still, ABC-GLM is much less affected by the "gap" prior than ABC- 
REG. The average Li-distance over all observed data sets and tolerance values e 
are 0.4412, 0.4925 and 0.1876 for the rejection algorithm, ABC-REG and ABC- 
GLM, respectively. 

Example posteriors with Sobs = 16 based on 5000 simulations with dist(S', Sobs) < 10 
are shown in Figure 2. While the two approximated posteriors (ABC-REG and 
ABC-GLM) are very similar when the priors are uniform over the whole range 
(Figure 2A), they become markedly different when we use a uniform prior with a 
"gap" between 3 and 6 (Figure 2B). Note that the posterior estimated with ABC- 
REG is max;imal in the forbidden region! This phenomenon clearly illustrates that 
ABC-REG is not consistent with the prior distribution. 

Application to chimpanzees. In standard taxonomies, chimpanzees, the closest 
living relatives of humans, are classified into two species: the common chimpanzee 
{Pan troglodytes) and the bonobo {Pan paniscus). Both species are restricted to 
Africa and diverged roughly 9 MYA ago [28, 4]. The common chimpanzees are 
further subdivided into three large populations or subspecies based on their separa- 
tion by geographic barriers. Among them, the western chimpanzees {P. troglodytes 
verus) form the most remote group. Interestingly, recent multilocus studies found 
consistent levels of gene flow between the western and the central {P. t. troglodytes) 
chimpanzees [28, 4]. Nonetheless, a recent study of 310 microsatellites in 84 com- 
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Figure 2: Posterior estimates using ABC-GLM and ABC-REG for Sobs = 16 based on 5000 
simulations with dist(S', Sobs) < 10 using a discontinuous prior 9 ~ Unif([0.005, 3] U [6, 10]). 



mon chimpanzees supports a clear distinction between the previously labeled pop- 
ulations [5]. Using a PC A analysis, indication for substructure within the western 
chimpanzees was found in the same study. 

To demonstrate the applicability of the model selection given in the Theory Section 
we contrast two different models of the western chimpanzee population with this 
data set: a model of a single panmictic population with constant size and a finite 
island model of constant size and constant migration among demes. While we 
estimated 6 = 2Ngfi, priors were set on A^^ and /i separately with logio{Nf.) 
Unif([3,5]) and ~ A/'(5 • 10-^,2 • 10"^) truncated on /x e [10"^, IQ-^]. In the 
case of the finite island model we had an additional prior ripop ^ Unif([10, 100]) on 
the number of islands, and individuals were attributed randomly to the different 
islands. 

We obtained genotypes for all 50 individuals reported to be of western chimpanzee 
origin from the study of Becquet et al. [5] excluding captive born hybrids. We 
checked the mutation pattern for each individual locus, and all alleles not match- 
ing the pattern were set as missing data. A total of 265 loci [5] were used, after 
removing the loci on the X and Y chromosome as well as those being monomorphic 
among the western chimpanzees. All simulations were performed using the soft- 
ware SIMC0AL2 [14] and we reproduced the pattern of missing data observed in 
the dataset. Using the software package ArlequinS.O [6] we calculated two summary 
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statistics on the dataset: the average number of alleles per locus, K, and Fjs, the 
fixation index within the western chimpanzees. We performed a total of 100,000 
simulations per model. 

In Figure 3 we report the Bayes factor of the island model according to (3) for 
different acceptance rates A^, see (13). While there is a large variation for very 
small acceptance rates, the Bayes factor stabilizes for > 0.005. Note that A^ < 
0.005 corresponds to less than 500 simulations, and that the ABC-GLM approach, 
based on a model estimation and a smoothing step, is expected to produce poor 
results since the estimation of the model parameters is unreliable due to the small 
sample size. The good news is that the Bayes factor is stable over a large range of 
tolerance values. We may therefore safely reject the panmictic population model in 
favor of population subdivision among western chimpanzees with a Bayes factor of 




o - 

\ \ \ 1 

0.0005 0.005 0.05 0.5 

acceptance rate 

Figure 3: Bayes factor for the island relative to the panmictic population model for 
different acceptance rates (logarithmic scale). For very low acceptance rates we observe 
large fluctuations whereas the Bayes factor is quite stable for larger values. Note that 
Ac < 0.005 corresponds to < 500 simulations, too small a sample size for trustful statistical 
model estimation. 



Discussion 

Due to still increasing computation power it is nowadays possible to tackle esti- 
mation problems in a Bayesian framework for which analytical calculation of the 

likelihood is inhibited. In such cases, Approximate Bayesian Computations are of- 
ten the choice. A key innovation in speeding up such algorithms was the use of a 
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regression adjustment, termed ABC-REG in this note, which used the frequently 
present linear relationship between generated summary statistics s and parameters 
of the model in a neighborhood of the observed summary statistics Sobs [!]• The 
main advantage is that larger tolerance values e still allow to extract reasonable 

information about the posterior distribution tt{6\s) and hence less simulations are 
required to estimate the posterior density. 

Here we present a new approach to estimate approximate posterior distributions, 
termed ABC-GLM, similar in spirit to ABC-REG, but with two major advantages: 
First, by using a GLM to estimate the likelihood function, ABC-GLM always con- 
sistent with the prior distribution. Secondly, our ABC-GLM approach is naturally 
embedded into a standard Bayesian framework, which in turn allows the application 
of well-known Bayesian methodologies such as model averaging or model selection 
via Bayes factors. 

ABC-GLM is compatible with any type of ABC-sampler, including likelihood-free 
MCMC [16], [3]. Also, more complicated regression regimes taking non-linearity 
or heteroscedacity into account may be envisioned when the GLM is replaced by 
some more complex model. A great advantage of the current GLM-setting is its 
simplicity which renders implementation in standard statistical packages feasible. 
Application to chimpeinzees. We showed the applicability of the model selection 
procedure via Bayes factors by opposing two different models of population struc- 
ture among the western chimpanzees Pan troglodytes verus. Our analysis strongly 
suggests population substructure within the western chimpanzees since an island 
model is significantly favored over a model of a panmictic population. While none 
of our simple models is thought to mimic the real setting exactly (models never 
do!), we still believe that they capture the main characteristics of the demographic 
history influencing our summary statistics, namely the number of alleles K and the 
fixation index Fjs- While the observed Fjs of 2.6% has been attributed to inbreed- 
ing previously [5] , we propose that such values may easily arise if diploid individuals 
are sampled in a randomly scattered way over a large, substructured population. 
While it was almost impossible to simulate the value Fjs = 2.6% in the model of 
a panmictic population, it easily falls within the range of values obtained from an 
island model. 
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